Gamma distribution#

The Gamma distribution is a flexible continuous distribution on positive real numbers. It’s a natural model for waiting times, lifetimes, and other positive, right-skewed quantities.


1) Title & classification#

| Item | Value |
|---|---|
| Name | Gamma (gamma) |
| Type | Continuous |
| Support | \(x \in (0,\infty)\) |
| Parameters (common) | shape \(\alpha>0\), scale \(\theta>0\) |
| Alternative | shape \(\alpha>0\), rate \(\beta = 1/\theta > 0\) |

What you’ll be able to do after this notebook#

  • recognize when a Gamma model makes sense (and when it doesn’t)

  • write the PDF/CDF and compute key moments

  • interpret how \((\alpha, \theta)\) change the shape

  • derive the mean/variance and the log-likelihood

  • sample from a Gamma using NumPy only (Marsaglia–Tsang)

  • use scipy.stats.gamma for inference and simulation

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

from scipy import stats
from scipy.special import gammainc, gammaln, psi

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)

2) Intuition & motivation#

What it models#

A classic story:

  • Events arrive according to a Poisson process with rate \(\lambda\).

  • The waiting time between events is exponential.

  • The waiting time until the \(k\)-th event is Gamma.

Even when that exact Poisson-process story isn’t literally true, Gamma is often a good model for positive quantities with right skew.

Typical real-world use cases#

  • Reliability / survival: time-to-failure for components under certain assumptions

  • Queueing / service times: durations that are sums of smaller random delays

  • Insurance / finance: positive claim sizes, severities, waiting times

  • Hydrology / weather: rainfall amounts over fixed intervals

  • Bayesian stats: priors for rates (Poisson/exponential) and precisions

Relations to other distributions#

  • Exponential: \(\text{Gamma}(\alpha=1,\theta)\)

  • Erlang (integer shape): \(\alpha=k\in\{1,2,\dots\}\); sum of \(k\) exponentials

  • Chi-square: the \(\chi^2_\nu\) distribution is \(\text{Gamma}(\alpha=\nu/2,\theta=2)\)

  • Additivity: if \(X_i\sim\text{Gamma}(\alpha_i,\theta)\) are independent with shared scale \(\theta\), then \(\sum_i X_i\sim\text{Gamma}(\sum_i\alpha_i,\theta)\)

  • Normal approximation: for large \(\alpha\), Gamma becomes close to normal (by CLT-like behavior)
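The normal approximation can be checked directly; a quick sketch using `scipy.stats`, with \(\alpha=200\) chosen arbitrarily as a "large" shape:

```python
import numpy as np
from scipy import stats

# Compare Gamma(alpha, theta) to a normal with the same mean and variance
alpha, theta = 200.0, 1.0
g = stats.gamma(a=alpha, scale=theta)
n = stats.norm(loc=alpha * theta, scale=np.sqrt(alpha) * theta)

x = np.linspace(g.ppf(0.001), g.ppf(0.999), 400)
max_cdf_gap = np.max(np.abs(g.cdf(x) - n.cdf(x)))
print(max_cdf_gap)  # shrinks roughly like 1/sqrt(alpha)
```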

A practical note: different fields use scale \(\theta\) or rate \(\beta=1/\theta\). Always check which one a library/API expects.
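As a concrete check of that note, the two parameterizations describe the same density once the rate is inverted (a small sketch; \(\beta=0.8\) is arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_fn

alpha, beta = 2.5, 0.8   # shape and *rate*
theta = 1.0 / beta       # the equivalent *scale*

x = np.linspace(0.1, 10.0, 50)

# scipy.stats.gamma expects a scale, so a rate must be inverted first
pdf_scale = stats.gamma(a=alpha, scale=theta).pdf(x)

# Rate-parameterized density: beta^alpha x^(alpha-1) e^(-beta x) / Gamma(alpha)
pdf_rate = beta**alpha * x ** (alpha - 1) * np.exp(-beta * x) / gamma_fn(alpha)

print(np.allclose(pdf_scale, pdf_rate))  # -> True
```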

3) Formal definition#

We use the shape–scale parameterization: \(\alpha>0\) (shape), \(\theta>0\) (scale).

Gamma function#

\[ \Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1} e^{-u}\,du \]

PDF#

For \(x>0\),

\[ f(x\mid\alpha,\theta) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\,x^{\alpha-1} e^{-x/\theta}, \]

and \(f(x)=0\) for \(x\le 0\).

CDF#

The CDF can be written using the lower incomplete gamma function. A convenient standard form uses the regularized lower incomplete gamma:

\[ F(x\mid\alpha,\theta) = P\left(\alpha, \frac{x}{\theta}\right) = \frac{\gamma\left(\alpha, x/\theta\right)}{\Gamma(\alpha)}, \]

where \(\gamma(\alpha, z)=\int_0^z u^{\alpha-1}e^{-u}\,du\).

def gamma_logpdf(x, alpha, theta):
    """Log-PDF of Gamma(alpha, theta) with support (0, inf)."""
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")

    logpdf = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    logpdf[mask] = (
        (alpha - 1) * np.log(x[mask]) - x[mask] / theta - gammaln(alpha) - alpha * np.log(theta)
    )
    return logpdf


def gamma_pdf(x, alpha, theta):
    return np.exp(gamma_logpdf(x, alpha, theta))


def gamma_cdf(x, alpha, theta):
    """CDF via the regularized lower incomplete gamma P(alpha, x/theta)."""
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")

    cdf = np.zeros_like(x, dtype=float)
    mask = x > 0
    cdf[mask] = gammainc(alpha, x[mask] / theta)
    return cdf


def gamma_entropy(alpha, theta):
    """Differential entropy of Gamma(alpha, theta)."""
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")

    return alpha + np.log(theta) + gammaln(alpha) + (1 - alpha) * psi(alpha)


def gamma_mgf(t, alpha, theta):
    """MGF M_X(t) for t < 1/theta; diverges for t >= 1/theta."""
    t = np.asarray(t, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")

    out = np.full_like(t, np.inf, dtype=float)
    mask = t < 1 / theta
    out[mask] = (1 - theta * t[mask]) ** (-alpha)
    return out


def gamma_cf(t, alpha, theta):
    """Characteristic function phi_X(t)."""
    t = np.asarray(t, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")

    return (1 - 1j * theta * t) ** (-alpha)
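As a sanity check, the same formulas can be compared against `scipy.stats.gamma`, which implements the identical shape–scale density (the formulas are inlined here so the check is self-contained):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln, gammainc

alpha, theta = 2.5, 1.3
x = np.linspace(0.05, 10.0, 40)

# Same formulas as gamma_logpdf / gamma_cdf above, written inline
logpdf = (alpha - 1) * np.log(x) - x / theta - gammaln(alpha) - alpha * np.log(theta)
cdf = gammainc(alpha, x / theta)

dist = stats.gamma(a=alpha, scale=theta)
print(np.allclose(logpdf, dist.logpdf(x)), np.allclose(cdf, dist.cdf(x)))  # -> True True
```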

4) Moments & properties#

For \(X\sim\text{Gamma}(\alpha,\theta)\) (shape–scale):

| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X]=\alpha\theta\) |
| Variance | \(\mathrm{Var}(X)=\alpha\theta^2\) |
| Skewness | \(\gamma_1 = \frac{2}{\sqrt{\alpha}}\) |
| Excess kurtosis | \(\gamma_2 = \frac{6}{\alpha}\) (so kurtosis \(=3+6/\alpha\)) |
| Mode | \((\alpha-1)\theta\) for \(\alpha\ge 1\) (otherwise the density peaks as \(x\to 0^+\)) |
| MGF | \(M_X(t)=(1-\theta t)^{-\alpha}\) for \(t<1/\theta\) |
| CF | \(\varphi_X(t)=(1-i\theta t)^{-\alpha}\) |
| Entropy | \(H = \alpha + \log\theta + \log\Gamma(\alpha) + (1-\alpha)\psi(\alpha)\) |

Key closure properties:

  • Scaling: if \(X\sim\text{Gamma}(\alpha,\theta)\) and \(c>0\), then \(cX\sim\text{Gamma}(\alpha,c\theta)\).

  • Additivity (shared scale): if \(X_i\sim\text{Gamma}(\alpha_i,\theta)\) independent, then \(\sum_i X_i\sim\text{Gamma}(\sum_i\alpha_i,\theta)\).
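Both closure properties can be verified by simulation; a sketch using NumPy's built-in gamma sampler and a Kolmogorov–Smirnov test against the predicted distributions (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a1, a2, theta = 1.5, 2.0, 0.7
n = 100_000

x1 = rng.gamma(shape=a1, scale=theta, size=n)
x2 = rng.gamma(shape=a2, scale=theta, size=n)

# Additivity: X1 + X2 should be Gamma(a1 + a2, theta)
ks_sum = stats.kstest(x1 + x2, stats.gamma(a=a1 + a2, scale=theta).cdf)

# Scaling: c * X1 should be Gamma(a1, c * theta)
c = 3.0
ks_scaled = stats.kstest(c * x1, stats.gamma(a=a1, scale=c * theta).cdf)

print(ks_sum.pvalue, ks_scaled.pvalue)  # both should be non-small
```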

alpha, theta = 2.5, 1.3

mean_theory = alpha * theta
var_theory = alpha * theta**2
skew_theory = 2 / np.sqrt(alpha)
excess_kurt_theory = 6 / alpha
entropy_theory = gamma_entropy(alpha, theta)

mean_theory, var_theory, skew_theory, excess_kurt_theory, entropy_theory
(3.25, 4.2250000000000005, 1.2649110640673518, 2.4, 1.9923121739725458)

5) Parameter interpretation#

Shape \(\alpha\)#

  • Controls how concentrated the mass is and how quickly the density rises from 0.

  • If \(\alpha=1\), the Gamma becomes exponential (memoryless).

  • If \(0<\alpha<1\), the density has a singularity at 0 (it blows up as \(x\to 0^+\)) and the distribution is highly right-skewed.

  • As \(\alpha\) increases, the distribution becomes more symmetric and eventually looks close to normal.

Scale \(\theta\) (or rate \(\beta=1/\theta\))#

  • Scale stretches/compresses the distribution horizontally.

  • Mean and standard deviation scale linearly with \(\theta\):

    • \(\mathbb{E}[X]=\alpha\theta\)

    • \(\mathrm{SD}(X)=\sqrt{\alpha}\,\theta\)

A good mental model: \(\alpha\) affects the shape, \(\theta\) affects the units / scale.

# PDF: changing shape alpha (keep theta fixed)
theta_fixed = 1.0
alphas = [0.5, 1.0, 2.0, 5.0]

x = np.linspace(1e-6, 12, 600)
fig = go.Figure()
for a in alphas:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=gamma_pdf(x, a, theta_fixed),
            mode="lines",
            name=f"α={a:g}, θ={theta_fixed:g}",
        )
    )

fig.update_layout(
    title="Gamma PDF: effect of the shape α (scale θ=1)",
    xaxis_title="x",
    yaxis_title="pdf",
)
fig.show()
# PDF: changing scale theta (keep alpha fixed)
alpha_fixed = 2.0
thetas = [0.5, 1.0, 2.0]

x = np.linspace(1e-6, 20, 700)
fig = go.Figure()
for th in thetas:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=gamma_pdf(x, alpha_fixed, th),
            mode="lines",
            name=f"α={alpha_fixed:g}, θ={th:g}",
        )
    )
    fig.add_vline(
        x=alpha_fixed * th,
        line_dash="dot",
        annotation_text=f"mean={alpha_fixed * th:.2f}",
        annotation_position="top",
    )

fig.update_layout(
    title="Gamma PDF: effect of the scale θ (shape α=2)",
    xaxis_title="x",
    yaxis_title="pdf",
)
fig.show()

6) Derivations#

Expectation#

Start from the definition (for \(X\sim\text{Gamma}(\alpha,\theta)\)):

\[ \mathbb{E}[X] = \int_0^\infty x\, f(x\mid\alpha,\theta)\,dx = \frac{1}{\Gamma(\alpha)\theta^\alpha} \int_0^\infty x^{\alpha} e^{-x/\theta}\,dx \]

Substitute \(u=x/\theta\) (so \(x=\theta u\), \(dx=\theta du\)):

\[ \mathbb{E}[X] = \frac{1}{\Gamma(\alpha)\theta^\alpha} \int_0^\infty (\theta u)^{\alpha} e^{-u}\, \theta du = \frac{\theta}{\Gamma(\alpha)} \int_0^\infty u^{\alpha} e^{-u}\,du = \frac{\theta\,\Gamma(\alpha+1)}{\Gamma(\alpha)} = \theta\alpha. \]

So \(\mathbb{E}[X]=\alpha\theta\).

Variance#

Similarly,

\[ \mathbb{E}[X^2] = \frac{1}{\Gamma(\alpha)\theta^\alpha}\int_0^\infty x^{\alpha+1} e^{-x/\theta}\,dx = \theta^2\frac{\Gamma(\alpha+2)}{\Gamma(\alpha)} = \theta^2\alpha(\alpha+1). \]

Then:

\[ \mathrm{Var}(X)=\mathbb{E}[X^2] - \mathbb{E}[X]^2 = \theta^2\alpha(\alpha+1) - (\alpha\theta)^2 = \alpha\theta^2. \]

Likelihood (i.i.d. sample)#

Let \(x_1,\dots,x_n\) be i.i.d. from \(\text{Gamma}(\alpha,\theta)\) with \(x_i>0\).

The log-likelihood is:

\[ \ell(\alpha,\theta) = \sum_{i=1}^n \log f(x_i\mid\alpha,\theta) = (\alpha-1)\sum_i \log x_i - \frac{1}{\theta}\sum_i x_i - n\big(\log\Gamma(\alpha) + \alpha\log\theta\big). \]

The MLE for \(\theta\) given \(\alpha\) comes from setting \(\partial\ell/\partial\theta = 0\):

\[ \hat{\theta}(\alpha) = \frac{\bar{x}}{\alpha}. \]

Solving for \(\hat{\alpha}\) jointly requires a numerical root find; one common estimating equation (from profiling out \(\theta\)) is:

\[ \log\hat{\alpha} - \psi(\hat{\alpha}) = \log\bar{x} - \overline{\log x}. \]
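That estimating equation is one-dimensional and monotone in \(\alpha\), so a bracketing root finder works well. A minimal sketch using `scipy.optimize.brentq` (the bracket `[1e-3, 1e3]` is an assumption that covers most practical shapes):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import psi

rng = np.random.default_rng(1)
alpha_true, theta_true = 3.0, 1.5
x = rng.gamma(shape=alpha_true, scale=theta_true, size=5000)

s = np.log(x.mean()) - np.log(x).mean()  # log x-bar minus mean of log x

# Solve log(alpha) - psi(alpha) = s for alpha, then theta-hat = x-bar / alpha-hat
alpha_hat = brentq(lambda a: np.log(a) - psi(a) - s, 1e-3, 1e3)
theta_hat = x.mean() / alpha_hat

print(alpha_hat, theta_hat)  # close to (3.0, 1.5)
```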

def gamma_loglik(x, alpha, theta):
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        return -np.inf
    if np.any(x <= 0):
        return -np.inf

    n = x.size
    return (
        (alpha - 1) * np.sum(np.log(x))
        - np.sum(x) / theta
        - n * (gammaln(alpha) + alpha * np.log(theta))
    )


# Quick sanity check: theta-hat given alpha
alpha_true, theta_true = 3.0, 1.5
x = stats.gamma(a=alpha_true, scale=theta_true).rvs(size=2000, random_state=rng)

theta_hat_if_alpha_known = x.mean() / alpha_true
theta_hat_if_alpha_known, theta_true
(1.5106385112629404, 1.5)

7) Sampling & simulation (NumPy-only)#

Two ideas#

  1. Integer shape (\(\alpha=k\in\mathbb{N}\), Erlang):

\[ X = \sum_{j=1}^k E_j,\quad E_j\sim\text{Exponential}(\text{scale}=\theta). \]

  2. General shape (\(\alpha>0\)): use the Marsaglia–Tsang rejection sampler (fast and widely used).

Marsaglia–Tsang (2000) sketch#

For \(\alpha\ge 1\):

  • Set \(d = \alpha - 1/3\), \(c = 1/\sqrt{9d}\).

  • Repeat:

    • draw \(Z\sim\mathcal{N}(0,1)\) and set \(V=(1+cZ)^3\).

    • draw \(U\sim\text{Uniform}(0,1)\).

    • accept \(X=dV\) if a (cheap) squeeze test passes, otherwise accept using the log test.

For \(0<\alpha<1\) we use a reduction:

  • If \(Y\sim\text{Gamma}(\alpha+1,\theta)\) and \(U\sim\text{Uniform}(0,1)\) independent, then \(X = Y\,U^{1/\alpha} \sim \text{Gamma}(\alpha,\theta)\).
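This reduction is easy to verify empirically; the sketch below draws the boosted \(\text{Gamma}(\alpha+1)\) with NumPy's built-in sampler and checks the transformed draws with a KS test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, theta = 0.4, 1.0
n = 100_000

y = rng.gamma(shape=alpha + 1.0, scale=theta, size=n)  # Gamma(alpha + 1, theta)
u = rng.random(size=n)
x = y * u ** (1.0 / alpha)                             # should be Gamma(alpha, theta)

ks = stats.kstest(x, stats.gamma(a=alpha, scale=theta).cdf)
print(ks.pvalue)  # should be non-small
```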

Below is a reference implementation using NumPy only (no scipy.stats.gamma.rvs).

def _gamma_rvs_mt_shape_ge_1(alpha, n, rng):
    """Marsaglia–Tsang sampler for Gamma(alpha, 1) with alpha >= 1."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)

    out = np.empty(n, dtype=float)
    filled = 0
    while filled < n:
        m = n - filled

        z = rng.normal(size=m)
        v = (1.0 + c * z) ** 3

        valid = v > 0
        if not np.any(valid):
            continue

        z = z[valid]
        v = v[valid]
        u = rng.random(size=v.size)

        # Squeeze step + main acceptance test
        accept = (u < 1.0 - 0.0331 * (z**4)) | (
            np.log(u) < 0.5 * z**2 + d * (1.0 - v + np.log(v))
        )

        accepted = d * v[accept]
        k = accepted.size
        out[filled : filled + k] = accepted
        filled += k

    return out


def gamma_rvs_numpy(alpha, theta=1.0, size=1, rng=None):
    """Sample from Gamma(alpha, theta) using NumPy only.

    Parameters
    ----------
    alpha : float
        Shape parameter (> 0).
    theta : float
        Scale parameter (> 0).
    size : int or tuple
        Output shape.
    rng : np.random.Generator
        Random number generator.
    """
    if rng is None:
        rng = np.random.default_rng()

    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")

    size_tuple = (size,) if isinstance(size, int) else tuple(size)
    n = int(np.prod(size_tuple))

    if alpha >= 1:
        x = _gamma_rvs_mt_shape_ge_1(alpha, n, rng)
    else:
        # Boost shape to alpha+1 >= 1, then apply the U^(1/alpha) correction
        y = _gamma_rvs_mt_shape_ge_1(alpha + 1.0, n, rng)
        u = rng.random(size=n)
        x = y * (u ** (1.0 / alpha))

    return (x * theta).reshape(size_tuple)


# Smoke test: mean/variance roughly match theory
alpha_test, theta_test = 2.5, 1.3
s = gamma_rvs_numpy(alpha_test, theta_test, size=50_000, rng=rng)
s.mean(), (alpha_test * theta_test), s.var(), (alpha_test * theta_test**2)
(3.2476485440524825, 3.25, 4.233222259786438, 4.2250000000000005)

8) Visualization#

We’ll visualize:

  • the PDF and how it compares to a Monte Carlo histogram

  • the CDF and how it compares to an empirical CDF

We’ll use the NumPy-only sampler from above.

def ecdf(samples):
    x = np.sort(np.asarray(samples))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


alpha_viz, theta_viz = 2.5, 1.3
n_viz = 80_000

samples = gamma_rvs_numpy(alpha_viz, theta_viz, size=n_viz, rng=rng)
x_max = float(np.quantile(samples, 0.995))
x_grid = np.linspace(1e-6, x_max, 600)

# PDF + histogram
fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=samples,
        nbinsx=70,
        histnorm="probability density",
        name="Monte Carlo samples",
        opacity=0.55,
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=gamma_pdf(x_grid, alpha_viz, theta_viz),
        mode="lines",
        name="Theoretical PDF",
        line=dict(width=3),
    )
)
fig.update_layout(
    title=f"Gamma(α={alpha_viz:g}, θ={theta_viz:g}): histogram vs PDF",
    xaxis_title="x",
    yaxis_title="density",
    bargap=0.02,
)
fig.show()

# CDF + empirical CDF
xs, ys = ecdf(samples)

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=gamma_cdf(x_grid, alpha_viz, theta_viz),
        mode="lines",
        name="Theoretical CDF",
        line=dict(width=3),
    )
)
fig.add_trace(
    go.Scatter(
        x=xs[::100],
        y=ys[::100],
        mode="markers",
        name="Empirical CDF (subsampled)",
        marker=dict(size=5),
    )
)
fig.update_layout(
    title=f"Gamma(α={alpha_viz:g}, θ={theta_viz:g}): empirical CDF vs CDF",
    xaxis_title="x",
    yaxis_title="CDF",
)
fig.show()

# Monte Carlo moment check
sample_mean = samples.mean()
sample_var = samples.var()
theory_mean = alpha_viz * theta_viz
theory_var = alpha_viz * theta_viz**2

sample_mean, theory_mean, sample_var, theory_var
(3.2598087590265608, 3.25, 4.286106165978027, 4.2250000000000005)

9) SciPy integration (scipy.stats.gamma)#

SciPy’s Gamma distribution is scipy.stats.gamma.

  • Shape is passed as a.

  • Scale is passed as scale.

  • There is also a loc parameter (a shift). For the standard Gamma distribution, use loc=0.

So the mapping is:

\[ X \sim \text{Gamma}(\alpha, \theta) \quad\Longleftrightarrow\quad \texttt{stats.gamma(a=alpha, loc=0, scale=theta)}. \]

If you’re using a rate \(\beta\) instead: scale = 1 / beta.

alpha_true, theta_true = 3.0, 1.5

dist = stats.gamma(a=alpha_true, loc=0, scale=theta_true)

x = np.linspace(0, 15, 500)
pdf = dist.pdf(x)
cdf = dist.cdf(x)

samples_scipy = dist.rvs(size=5000, random_state=rng)

# MLE fit (note: constrain loc=0 to match the usual Gamma support)
a_hat, loc_hat, scale_hat = stats.gamma.fit(samples_scipy, floc=0)
a_hat, loc_hat, scale_hat
(2.970249528831039, 0, 1.5243399321816582)

10) Statistical use cases#

A) Hypothesis testing#

  • Goodness-of-fit: do Gamma parameters explain the data?

  • Model comparison: e.g. exponential vs general Gamma (is \(\alpha=1\) plausible?)

B) Bayesian modeling#

Gamma is a standard conjugate prior for rates:

  • Poisson likelihood (counts) with unknown rate

  • Exponential likelihood (waiting times) with unknown rate
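For the Poisson case the conjugate update is a one-line computation. With prior \(\lambda\sim\text{Gamma}(\alpha_0,\beta_0)\) (rate parameterization) and counts \(y_1,\dots,y_n \mid \lambda \sim \text{Poisson}(\lambda)\):

\[ p(\lambda\mid y) \;\propto\; \lambda^{\alpha_0-1}e^{-\beta_0\lambda} \;\prod_{i=1}^n \lambda^{y_i} e^{-\lambda} \;=\; \lambda^{\alpha_0+\sum_i y_i-1}\, e^{-(\beta_0+n)\lambda}, \]

so the posterior is \(\text{Gamma}\big(\alpha_0+\sum_i y_i,\ \beta_0+n\big)\); this is the update used in the Gamma–Poisson code later in this section.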

C) Generative modeling / hierarchical models#

Gamma is often used as a latent positive variable (rates, precisions, intensities). A classic example is a Poisson–Gamma mixture, which creates overdispersed count data (marginally negative binomial).

# A) Hypothesis testing example: Exponential (alpha=1) vs Gamma (alpha free)

# Generate data from a Gamma alternative
alpha_data, theta_data = 2.0, 2.0
data = stats.gamma(a=alpha_data, scale=theta_data).rvs(size=2500, random_state=rng)

# Fit the unrestricted Gamma
a_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0)

# Null model: alpha = 1 (exponential). MLE for theta is just the sample mean.
theta0_hat = data.mean()

ll_null = gamma_loglik(data, alpha=1.0, theta=theta0_hat)
ll_alt = gamma_loglik(data, alpha=a_hat, theta=scale_hat)

lr_stat = 2 * (ll_alt - ll_null)
p_value_lrt = stats.chi2.sf(lr_stat, df=1)

# Goodness-of-fit (KS) for the fitted Gamma
D_ks, p_value_ks = stats.kstest(data, "gamma", args=(a_hat, 0, scale_hat))

lr_stat, p_value_lrt, D_ks, p_value_ks
(601.8915919589344,
 6.491780106731331e-133,
 0.01017051215045639,
 0.9558863897485597)
# B) Bayesian modeling: Gamma prior for a Poisson rate
# Use rate parameterization for the prior/posterior: Gamma(α, β) with mean α/β.

alpha0, beta0 = 2.0, 1.0
lambda_true = 3.0
n = 25

y = rng.poisson(lam=lambda_true, size=n)

alpha_post = alpha0 + y.sum()
beta_post = beta0 + n

# Convert rate β to scale θ=1/β for our Gamma(alpha, theta) helper functions
lam_grid = np.linspace(1e-6, max(10, 3 * lambda_true), 600)
prior_pdf = gamma_pdf(lam_grid, alpha0, theta=1 / beta0)
post_pdf = gamma_pdf(lam_grid, alpha_post, theta=1 / beta_post)

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=lam_grid,
        y=prior_pdf,
        mode="lines",
        name=f"prior Γ(α={alpha0:g}, β={beta0:g})",
        line=dict(width=3),
    )
)
fig.add_trace(
    go.Scatter(
        x=lam_grid,
        y=post_pdf,
        mode="lines",
        name=f"posterior Γ(α={alpha_post:g}, β={beta_post:g})",
        line=dict(width=3),
    )
)
fig.add_vline(x=lambda_true, line_dash="dash", line_color="black", annotation_text="true λ")
fig.update_layout(
    title="Gamma–Poisson conjugacy: prior → posterior on λ",
    xaxis_title="λ",
    yaxis_title="density",
)
fig.show()

y.sum(), y.mean(), (alpha_post / beta_post)
(68, 2.72, 2.6923076923076925)
# C) Generative modeling: Poisson–Gamma mixture (overdispersed counts)

alpha_latent, theta_latent = 5.0, 0.6  # latent lambda ~ Gamma(alpha, theta)
n_obs = 20_000

lambdas = gamma_rvs_numpy(alpha_latent, theta_latent, size=n_obs, rng=rng)
counts_mixture = rng.poisson(lam=lambdas)

mean_counts = counts_mixture.mean()
counts_poisson = rng.poisson(lam=mean_counts, size=n_obs)

max_k = int(np.quantile(counts_mixture, 0.995))
bin_spec = dict(start=-0.5, end=max_k + 0.5, size=1)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=counts_mixture,
        xbins=bin_spec,
        histnorm="probability",
        name="Poisson–Gamma mixture",
        opacity=0.6,
    )
)
fig.add_trace(
    go.Histogram(
        x=counts_poisson,
        xbins=bin_spec,
        histnorm="probability",
        name="Poisson (same mean)",
        opacity=0.6,
    )
)
fig.update_layout(
    title="Overdispersion from a Poisson–Gamma mixture",
    xaxis_title="count",
    yaxis_title="probability",
    barmode="overlay",
)
fig.show()

counts_mixture.mean(), counts_mixture.var(), counts_poisson.var()
(2.98515, 4.7931294775, 2.9583533975000003)
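The mixture above has a known closed-form marginal: negative binomial with \(n=\alpha\) and \(p = 1/(1+\theta)\) in the shape–scale parameterization. A sketch checking simulated counts against `scipy.stats.nbinom` under that mapping:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha_latent, theta_latent = 5.0, 0.6
n_obs = 100_000

lam = rng.gamma(shape=alpha_latent, scale=theta_latent, size=n_obs)
counts = rng.poisson(lam=lam)

# Marginal should be NB(n = alpha, p = 1 / (1 + theta))
nb = stats.nbinom(n=alpha_latent, p=1.0 / (1.0 + theta_latent))

kvals = np.arange(0, 15)
emp = np.array([(counts == k).mean() for k in kvals])
gap = np.max(np.abs(emp - nb.pmf(kvals)))
print(gap)  # should be close to zero
```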

11) Pitfalls#

  1. Parameterization confusion

  • Some sources use rate \(\beta\) (larger \(\beta\) means smaller values), others use scale \(\theta=1/\beta\).

  • scipy.stats.gamma uses scale (not rate).

  2. loc in SciPy

  • scipy.stats distributions often include a shift loc by default.

  • For the standard Gamma with support \((0,\infty)\), you typically want loc=0.

  3. Behavior near zero

  • If \(0<\alpha<1\), the PDF diverges as \(x\to 0^+\). Plots should avoid including exactly \(x=0\).

  4. Numerical stability

  • Computing the PDF directly can underflow/overflow for extreme parameters.

  • Prefer logpdf (and gammaln) when doing inference or optimization.

  5. Invalid parameters / data

  • Gamma requires \(\alpha>0\), \(\theta>0\), and data \(x_i>0\).

  • If your data include zeros or negatives, consider a shifted model (loc), a hurdle model, or a different distribution.
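The numerical-stability pitfall in action: for extreme parameters the density value can underflow in double precision even though the log-density is perfectly finite (the numbers below are chosen only to force the underflow):

```python
import numpy as np
from scipy import stats

dist = stats.gamma(a=2000.0, scale=1.0)
x = 5000.0  # deep in the right tail

print(dist.pdf(x))     # underflows to 0.0 in double precision
print(dist.logpdf(x))  # a large negative but finite number
```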

12) Summary#

  • Gamma is a continuous distribution on \((0,\infty)\) with parameters shape \(\alpha\) and scale \(\theta\).

  • It naturally models waiting times and sums of exponentials, and it’s a workhorse for positive skewed data.

  • Key formulas: \(\mathbb{E}[X]=\alpha\theta\), \(\mathrm{Var}(X)=\alpha\theta^2\), \(M_X(t)=(1-\theta t)^{-\alpha}\).

  • Sampling can be done efficiently with Marsaglia–Tsang; SciPy provides stats.gamma for pdf, cdf, rvs, and fit.

  • Always watch for parameterization (scale vs rate), loc, and numerical stability near \(x=0\) / extreme parameters.

References#

  • Marsaglia, G. and Tsang, W. W. (2000). A Simple Method for Generating Gamma Variables.

  • SciPy documentation: scipy.stats.gamma

  • Any standard mathematical statistics text (Gamma function, incomplete gamma, conjugacy)